Summary
[...] data are in the format of peptides. A peptide is composed of
non-numerical attributes such as amino acids or nucleic acids.
These non-numerical data must be properly encoded, because not all
machine learning algorithms can accept non-numerical inputs. In
addition to the commonly used binary encoding, this chapter has
introduced an alternative encoding approach that has biological
significance. This novel encoding method is called the bio-basis
function. This alternative does not treat the residues of a peptide
as independent variables. Just like the move from the 1980s' edit
distance-based sequence homology alignment to the 1990s'
mutation-based homology alignment, the introduction of the
bio-basis function has [...] the binary encoding method. When
integrated with different discriminant analysis algorithms, the
bio-basis function can be used efficiently for better protease
cleavage pattern discovery, as shown in several [...] in this
chapter. Importantly, they show different outstanding features for
protease cleavage pattern discovery. For instance, the mixture
bio-basis function benefits from the use of multiple mutation
matrices, leading to improved performance in peptide cleavage
pattern discovery. Through the integration of the bio-basis
function with the random forest algorithm, the cleaved peptides
that are closest to the probable [...] can be discovered. This
provides important information for efficient inhibitor design.
In addition, this cutting-edge [...] can also be used for similar
biological/medical pattern discovery